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Abstract 

Forth is not known in mainstream computing, because the reverse 
polish notation is difficult to learn. To overcome the problem, a Forth 
language simulator is described which makes it easier for newbies 
to play around with a multistack-machine. The idea is to reduce 
the performance down to 1 instruction per second so that the user 
can observe in a singlestep mode what his program is doing. 1 The 
question which remains open is how to program in Forth itself. The 
development of stack based algorithm is outside the scope of this 
paper, here is only a simulator given, which is able to execute existing 
Forth code. 
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1 History 

1.1 Short history of stackmachines 

Charles L. Hamblin published in 1957 a paper about stackmachines 
which was later realized in the KDF.9 computer. It was the first 
stackmachine who was ever build. In the late 1960s Chuck Moore 
invented the Forth programming language. Forth was first used as a 
programming language for existing computers and in the 1980s dedi¬ 
cated Forth chips in hardware were build, for example the RTX2010. 
The 1990s and later was dominated by private research. :The first 
Forth user groups were founded and the current development is, that 
amateurs are using FPGA hardware for implementing a stackmachine. 

From a non-Forth perspective the question is, what is so special 
about a stackmachine? Can this technology be used to make comput¬ 
ers more efficient? Stackmachines are usually engineered not from 
a user perspective but from a hardware point of view. The architec¬ 
ture allows to minimize the transistor count and make the design 
very clean. The negative aspect of Forth is, that it is not possible to 
run normal highlevel languages like Pascal, C or C++ on this. So 
stackmachines are useless for mainstream computing. 

If we are observing the Forth literature, many attempts were un¬ 
dertaken in the past to port C compilers to stackmachines. In most 
cases not very successful. The problem is described in the literature 
as compiler-friendly CPU. Or better, only a CISC CPU is compiler 
friendly. That means, mainstream computing hardware is designed 
for the needs of the programmers and endusers, not for the needs 
of the machine itself. Another disadvantage of Forth CPUs is, that 
even somebody is able to convert existing c application to Forth, the 
algorithm is not very efficient. The reason is, that algorithm like 
sorting has to be implemented for a stackmachine differently. Usu¬ 
ally algorithm are in contrast written for CISC CPUs, they are using 
variables and not a stack. 
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Early compilers Let us go back in the year 1960 in which stack- 
machines were first researched by pioneers. The idea was, that on 
one hand there is a lowlevel machine language and on the other hand 
the need for programming in a high-level language. Today we call 
the gap between them a compiler, in the 1960s the process was called 
“automatic programming”. The idea is not to program in Assembly 
language but in a high-level-language. 

Everybody who is familiar with the reverse polish notation on the 
HP calculator knows, that it is difficult to use. So the reverse polish 
notation was replaced by something which works easier. Todays 
calculator are user friendly, that means, the user is not forced to use 
the RPN notation. In the computer language this innovation was 
called “Fortran”, for formula translator. It improves the programming 
of machines. The programmer were not longer forced to type in 
assembly codes, but can type in their program in normal notation. 
The result of the development of the first compilers was, that the idea 
of stackmachines were no longer needed. A Forth like language is 
similar to Assembly, and with the first compilers there is no need to 
program in Forth. The result was, that Stackmacines were a niche 
topic since that time and instead CISC based cpu dominated the 
business. 

That was no mistake, because the reverse polish notation is indeed 
hard to master and is not helpful if someone is programming huge 
number of codelines. I would suggest that non-RPN calculators and 
non-stackmachines fits better to modern needs in the mathematics 
and programming. But so called register machines haven’t replace 
stackmachines completely. Today, we have both: the obscure RPN- 
notation and the modern c compiler plus CISC cpu. I do not think 
that there is need to decide between them, both domains have their 
advantages. 

The literature from that time is not online-available. Only the 
bibliographic references are stored in the worldcat catalogue. The 
fulltext paper of early stackmachines are not scanned in. But that is 
no big problem, because later authors are citing from the papers and 
a modern paper written in 1980s and later gives the same amount of 
information like the original papers in the 1960s. 

2 Misc 

2.1 What is Forth? 

It is easy to define Forth. It is a virtual machine like the JVM, which 
is programmed in assembly. The VM has the structure of a dualstack 
pushdown-automata and provides commands like “drop, do .” To run 
a Forth program on a computer a concrete realization of the Forth 
VM is needed. Most Forth implementations are written in assembly 
language. But it is also possible to use C and compile this to assembly 
to get a Forth system. 

At least one word to the performance of Forth. A normal Forth likes 
Jonesforth is not very fast. If the user want’s to run a Forth program, 
the VM maps every Forth word to a machine language command. It 
is comparable to a BASIC interpreter. The faster approach would be 
to use a Forth compiler, such compiler are existing, but they are only 
commercial tools like VFX Forth. 

The average Forth VM is slower than what a GCC compiler can 
provide. The question is: why are the people fascinated from Forth 
if the language is difficult to program and slower in the execution? I 
believe, that 99% are using Forth as an education tool. They are not 


interested in writing real applications with it, instead they are using 
Forth like the PL/0 language invented by Nikolas Wirth to understand 
assembly language and compiler development. For the educational 
purpose Forth is well suited. The reason is, that the user is forced to 
become familiar with his computer. For implementing a Forth VM 
from scratch he must understand normal assembly language for the 
6502 CPU, the PIC microcontroller or the AMD64 architecture. 

That means, that a Forth VM can be compared with a JVM or the 
GCC compiler. For programming such software from scratch, the 
user must understand the computer on hardware level. The question 
is not, what feature Forth has, the question is what type of knowledge 
the user must bring in to call himself an Forth guru. For example, 
the famous Chuck Moore can be called an Forth guru not only he 
knows who to write a Hello world program in Forth, but he is familiar 
with assembly language on mainstream computers and even has 
designed his own CPU from scratch, so he is also an expert in chip¬ 
manufacturing. 

The funny thing about Forth is, that it stands not in contrast with 
mainstream computing. I would call Forth not a programming lan¬ 
guage, but a didactic concept. Let us first define what the problem 
is. The problem is, that there is a computing class with students in 
the first semester. And they have absolutely no idea what a compiler 
is, how to design a CPU from scratch or how existing CPUs like 
the AMD64 system works. How do we get in a short period of time 
the students to an expert level? They need a hands-on project in 
which they learn in a short amount of time a maximum of knowl¬ 
edge. And here comes Forth into the game. It is a combination of a 
teaching-programming language, a teaching operating-system and a 
teaching CPU. A possible alternative to Forth is, to use existing “edu¬ 
cational only” projects, for example we can combine the Z80 CPU 
(which is used on most universities in computer courses) with the 
Minix operating system with the above mentioned PL/0 programming 
language. 

Sometimes, Forth is called an easy to learn language. That is 
wrong. Because the problem is not to learn Forth itself, the problem 
is everything what is outside of Forth. That means, understanding 
what a stackmachine is, on a pen&paper computer is easy, but im¬ 
plementing this on a real cpu is hard. Let us imagine an example 
project in a computer class: “Implement a forth dual-stackmachine on 
the 6502 CPU with assembly commands according to the jonesforth 
specification”. It is a clear task-description and it is possible to do so. 
But it is not easy going, it is an advanced topic of computer science 
which has to do with many aspects. 

The misconception of Forth is, that the language is comparable 
to C which means, that after some experience the beginner is able 
to write programs with it. No, he is not. It is not possible to learn 
Forth with the aim to use it in reality. Instead Forth is some kind of 
never-ending-curriculum. The idea is not that the learning progress 
ends someday, the idea is to go deeper into the subject. If the student 
is able to write a “Hello world” program in Forth, the next task is to 
write his own Forth VM. And if he has done so, the next task is to 
make his own Forth CPU. 

Let us investigate how Forth is used in reality. Most companies 
who are developing Microcontrollers are using Forth in daily life. 
They have programmed their own Forth IDEs or they buy the license 
of VFX forth. Why are they doing so is simple. Because companies 
have employees and employees must be teached. The first thing what 
the new employee in a company has to do is to attend a course about 
programming and hardware design. And the lingua franca in this 
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courses is what? No, it is not C, because C can be used without 
understanding the language. Most c programmers now nothing about 
compiler programming itself. They know the language itself, and 
thats it. But, a microcontroller company needs employees who are 
familiar with lowlevel programming, and knowledge in C is not 
enough. 

The problem for what Forth is used, has nothing to do with pro¬ 
gramming or CPU design itself. It is more a social problem. The 
question is how to train lots of employees in a short amount of time to 
a deep understanding of technology. Forth is the answer. Let us take 
an example. Intel has produced a new chip generation with around 
1000 opcodes. The chip is working great. So Intel has no problems, 
right? No, Intel has a lots of problems, because somebody in the com¬ 
pany must understand how the system works. The knowledge about 
Intel chips is not public available, because it is a commercial product. 
Only the employees of Intel are aware of it. And the employees are 
fluctuating. The question is not, how to program the intel cpu. the 
question is how to teach new employees of Intel in a short amount of 
time. 

Forth itself has nothing to do with Intel cpus, it is a fictional virtual 
machine invented for no concrete purpose. But Forth is the right 
choice if around 1000 new employees have to be trained to become 
experts for Intel assembly language. Implementing a Forth VM on an 
Intel CPU is the first project, what the students get. They learn all the 
commands of the hardware. They reverse engineering a system which 
is already there. And after the project is over, they have acquired the 
knowledge. 

I do not believe, that we see in future any program which was 
written in Forth, or any CPU which was designed as a stackmachine. 
What we see is, that Forth is used to increase the knowledge of people. 
And this people will write normal C compilers and create normal 
CISC CPUs. 

Forth as universal assembler Sometimes, Forth is described as 
an universal assembly language, which runs on all computers. A pro¬ 
gram, written in Forth for the C-64 CPU runs without modification on 
an x86 CPUs and runs also on a Microcontroller. But this recognition 
is wrong. Instead, the real portable Assembler is the C language. C is 
a programming language used in real projects, for example to write 
the UNIX operating system which is portable between all computers. 
Instead, Forth can be seen as an educational language. It is not a 
portable assembler, but a subject in a computer-science class. That 
means, Forth fits very well in existing university like structures which 
are teaching knowledge and evaluate if the students have understand 
it. 

2.2 Forth is the future 

According to the TIOBE Index, nobody is interested in Forth. Never¬ 
theless is Forth the language of the future. How can fit this together? 
As an introduction I want to cite the famous Computer Chronicles 
episode of 1984 2 which presented Forth in television. Since this time, 
more than 30 years are gone, and the episode looks very fresh. I 
would suggest, that from now in 30 years in the year 2048 the Forth 
programming language will also be fresh and new. It is a timeless pro¬ 
gramming language and the need for teaching Forth has something to 
do with people. That means, in every time there are people who want 

-https://www. youtube.com/watch?v=D5osk91rGNg 


to know what a stackmachine is, and how a minimal operating system 
looks like. Forth is timeless like the MMIX CPU. It was constructed 
for eternity. 

Let us imagine how a potential future of Forth will look like. What 
will not happen is, that Forth can replace C or C++. The reason is, 
that Forth is difficult to program and slow in execution. Especially, if 
the Forth Virtual machine was programmed by beginners it is very 
slow on x86 hardware. But Forth has a famous standing in computer 
education. That means in a setup in which a female or male professor 
is explaining to the students what Forth is, and let them work on Forth 
related project. Forth is some kind of artificial language which is not 
used in reality, but in computer courses at the university. And i would 
suggest, that Forth is for that task superior to other artificial languages 
like Java, PL/0 or Haskell. If the students should learn a language 
which is not spoken anywhere else, apart from their computercourse. 
Forth is the perfect choice. It is minimalistic and it is possible to 
explain with that language every aspect of computer science like 
compilers, cpu-design, transputer, robotics and algorithm. 

The assumption is, that these subjects are relevant in the future, 
and there is also a need for teaching the domains to students. So 
Forth will have a wonderful outlook. I would suggest, that good 
computer courses in the next 100 years all using Forth. On the other 
hand. Forth is dead for practical applications. It is not possible to use 
Forth as a replacement for Intel x86 CPUs or as a replacement for 
C++. Perhaps we can describe the problem on the Minix vs. Linux 
debate. Minix is a teaching operating system. It is minimalistic and 
can be used in computer courses. While Linux is a commercial grade 
operating system which can be used on real hardware in mission 
critical systems. Writing an academic paper about Linux is a bit 
difficult, because it is changing all the time. Instead a paper about 
Minix makes sense, because it is so static and timeless. 

2.3 What is a good Forth implementation? 

Forth implementation which runs on real hardware are many out 
there. I personally prefer Gforth, because it is in the Fedora package 
manager available but any other Forth system may also be working. 
The main problem with today’s Forth implementations is, that they 
are programmed for real usage. The idea is to start gforth, type in 
some Words and get a turnkey application. This workflow doesn’t 
fit to the need of the audience. If somebody wants to implement 
an algorithm or playing around with a microcontroller a normal C- 
compiler gcc is much better. That means, the code runs faster, and 
the code is easier to program. 

The needed feature set for a Forth system is different from what 
the user wants to do with C. A forth system must support the learning 
progress of his user. It should be constructed as a serious game with 
the aim to provide knowledge. A Forth system is equal to a Forth 
“computer based training’’. In the optimal case it consists of a pdf 
book plus an interactive simulator for trying out the knowledge. The 
simulator has not the obligation to run Forth code exactly or with 
maximum speed, but with visual feedback. That means, it should be a 
Forth simulator, like an Assembly simulator. The most sophisticated 
computer based learning software is called “Softsim” and is part of 
the GA144 cpu. But it goes not far enough. My own Forth system 
has more similarities to a game. 

The idea is to program the software not against a certain type of cpu, 
or use some high-end-compiler optimization. Instead the Forth engine 
is equal to a game engine. It is primarily a visual application which 
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Figure 1: Forth simulator 


prints out the graphical stack and the user can move the instruction 
pointer with keys. The simulator is not ready, it is only a prototype. 
The idea is, that the user can run the game in single step mode or in a 
automatic mode, which cycles through the entire Forth program. The 
Forth system does not exist under a operating system or on a cpu, it 
is written in C++ and can be compared with a minecraft CPU. That 
means, the op-codes are doing nothing, they are interpreted from the 
game, but not by a real cpu or by the Linux operating system. 

Let us investigate who usually a Forth system is intended for. Most 
existing Forth systems has the idea, that the user is programming 
a real machine. For example, he runs the Forth under his x86 PC, 
on a microcontroller or on FPGA. The idea is perhaps, that the user 
is a programmer and wants to try out Forth in real life for making 
something useful with it, for example let a LED blinking or write 
an algorithm. I think, the user has no interested in using Forth for 
programming or trying out new hardware. I think, the user want’s to 
understand Forth on a theoretical basis without programming in the 
language itself. 

The reason why the user think so, has to do with complexity. Forth 
in real life is very complicated. To program a working a Forth system 
or write sourcecode in Forth the user must have a deep knowledge 
of operating systems and hardware. And if an error is happening, 
in all cases the user has a lack of knowledge. For example, he is 
testing out the new eForth system on a microcontroller because he 
want’s to let the LED blink. The problem is, that something goes 
wrong and the reason can be anything. Even the simple task of let the 
LED blink is too demanding for a new student. What he want’s is a 
clean environment which is observable. In the above screenshot of 
my SFML based Forth,, no hidden cpu, microcontroller or software 
is there. The system consists only of the GUI what the user sees on 
the screen. That means, the Forth has really only two stacks and no 
virtual stacks or something else. 

From a technical perspective this GUI software may look like not 
sophisticated. Because it can nearly nothing and is not comparable 
to a full blown gforth or the eForth system. But it has a different 
purpose. The aim is to be an educational tool only. 

The question remains, what the user can do with the Forth simu¬ 
lator. Answer: he can use it for educate other people. Forth is not a 
real programming language like C. If someone needs an operating 


system or a programming language he is better suited with Linux. 
The intention of Forth has more to do with higher education and 
theoretical computer science. It is a subject of academic papers, but 
not of working systems. 

I want to tell also the problems of the Forth simulator. My program 
is not able to read a VHDL description. So it is not possible to 
implement a Forth CPU on the hardwarelevel and simulate it. Instead, 
the execution engine is a fixed block, realized with if-then-statements 
in C++. Here is enough room for improvements. The problem is, that 
I do not how, how logicgates can be wired together to implement a 
complete CPU. This is usually done in the VHDL file, but it is hard 
to understand. 

The idea of using a simulator is not completely new. There are 
many “Assembly Language simulators” out there, which looks very 
similar to my Forth simulator. But they are simulating a register 
machine, not a stackmachine. The “Marie Simulator” for exam¬ 
ple, is a Java program which simulates a CPU. It is an addon for a 
book about computer programming.[8] Another project is the “Little 
man computer”, which is also a GUI interface for teaching machine 
language. [10] 

A machine simulator, dedicated to a stackmachine, is described in 
[3] The idea is similar to other machine simulators. The program has 
not the intention to support engineers in debugging real programs, 
but to teach students with computer-based-training. 

2.4 Making something useful with a Forth machine 

My own Forth machine simulator runs quite well. The datastack 
and the returnstack is filled with numbers, and it is visible for the 
user, how the game works. But one question remains open - for 
what purpose? Let us first examine, what the Forth machine can not 
do. It is not possible to run any other language than Forth on the 
machine. For example C. In theory it is perhaps possible to write a C 
to Forth compiler, but the system would not working well. Because a 
c-compiler works better on a register machine. The better idea would 
be, to program a Forth machine in the standard Forth dialect. And 
this results into the above question, why should somebody do so? 

Currently, I have no answer to it. As far as I can see from the 
sourcecode, programming in Forth takes longer than programming 
in C. The reason is, that the programmer has to use the stack, if he 
want’s to do some useful operating. And writing a program which is 
using the stack is more complicated than using normal variables or 
objects from a C++ program. 

In theory. Forth is a great language for formulate algorithm in a 
machine independent format, but most algorithm in computer science 
can be formulated in normal Python easier. That means, there is no 
advantage for writing a pathplanner in Forth or implement a longer 
program in it. 

There is another potential reason. Writing a Forth program makes 
sense, if the aim is to improve the simulator. For testing out, if the 
simulator works correctly, we need some longer sourcecode. Another 
reason is, that programs written in Forth are smaller, than in other 
programming language. I would suggest, that learning to program in 
Forth helps the students for program in other languages like C++. 

What else can we do with Forth? Running a Forth program in a 
simulator is a very slow task. In my own simulator, not more than one 
instruction per second is executed, and it is not possible to increase 
the speed very much. In contrast, todays homecomputer are very fast 
devices with a highspeed 2 ghz clock. What programmers can learn 
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from Forth is to write small programs, which need not much cpu 
power. The idea is, to reverse the “Moore’s law’’ into a race to the 
bottom. The question is, how slow can a cpu be made for running all 
the important programs on it. The idea is to use computers in the size 
of the Commodore 64 with the same speed and the same low amount 
of ram for doing useful tasks with it. With normal c programming 
techniques it is not possible. 

Let us stay on the Commodore 64. There was some time ago a 
project called Contiki, which is an operating system for the C-64 
which includes a TCP/IP stack. A nice piece of work. But, program¬ 
ming in 6502 assembly is very complicated, and the sourcecode runs 
only on cpus with the same instruction set. The better idea would 
be to program a Contiki like operating system in Forth, and write 
a Forth machine for any hardware out there. Or let me explain it 
from the other way around. The 6502 cpu is no longer in production. 
And running Contiki on the x86 CPU is not possible, because the 
instruction set is different. Assembly language is the wrong decision, 
if someone wants to program a small operating system. Forth instead, 
is also a lowlevel language but can be executed on any machine. 

2.5 Is Forth an elitist language? 

Bernd Paysan wrote that Forth is a language for the hacker elite: 

“So IMHO Forth is an elitary language, designed and writ¬ 
ten for the best, or gifted, not for the masses.” 3 

And he is right. Tthe number of people who are using Forth is small. 
Stackoverflow for example has under the tag “[Forth]” not more 
than 189 questions online. 4 While C++ for example has around 500k 
questions. Or to explain it the other way around, if a newbie is posting 
one single question about Forth, for example “Hi, I’ve installed gforth 
but my hello world app doesn’t work” he has increased the number 
of worldwide question about the subject with 0.5%! 

But why is the Forth community so small? It has to do with the 
costs. It is difficult for the people to get information about Forth 
or buy Forth related hardware. Forth can be seen as an obscure 
programming language which is not widely known. In contrast, the 
educational programming language “little man computer" is teached 
on many universities worldwide. Many simulators and user groups 
are out there which are researching in detail how to program this 
simple computer. 

I do not think, that Forth itself is the problem. Sure, programming 
in Forth is not easy, but the language is not more complicate than the 
assembly syntax of LC-3 (another educational language). I think the 
bottleneck is somewhere else. The number of introductory books is to 
low and books which are available for example [5] are not outdated. 
The book was published 27 years ago, it is a great book but can’t be 
recommended anymore. 

2.6 Forth ressources 

The major problem in the Forth community is to find information. At 
first, the number of papers is small, and many of them are behind a 
paywall. Second, the information are often not logical, that means, 
author 1 contradicts author 2. So the overall knowledge about FORTH 
is decentralized, and not easy accessible. Overcoming the problem is 
simple: emulate Forth in a sandbox. 

’https://beriHl-paysan.de7vvhy-forth.html 

4 https://stackoverflow.com/questions/tagged/forth 


I mean. Forth itself is very complicated. If the user is trying to 
program a Forth virtual machine for a certain type of processor for 
example the x86 CPU he increases the chaos. The better way is not 
to focus on Forth itself, but to program a game, in which a Forth 
computer is simulated. The idea is to see Forth as an educational 
programming language, like the Little man computer. The conse¬ 
quence is, that the problems are different. Because there is no longer 
a Forth system which runs somewhere, but instead there is only the 
Forth simulator which is written in C++ and the aim is to improve the 
simulator. 

Inside the simulator it is possible to run a forth program, for ex¬ 
ample “12 +”, but this is not the important part. The idea is not to 
program in Forth, but to program the environment for Forth. The 
advantage is, that from day 1 the overall system is working, even 
if the Forth interpreter is not able to do sophisticated things. For 
example, my own Forth simulator is very slow (1 instruction per 
second), and can only execute 3 commands. That means, the system 
is not able to interpret a standard Forth program. But that it is not 
a problem, because the simulator itself runs stable. That means, the 
C++ code compiles, something is shown on the gui, and the user can 
move the instruction pointer. 

The nice feature is, that the overall problem is under control, be¬ 
cause the C++ language and the aim of writing with it a game is well 
researched. That means, the user has no longer questions about Forth 
direct, but he is trying to improve a normal C++ program. That means, 
it is ok, if he doesn’t understand Forth, because he is not interested 
in programming the language. Instead, Forth is implemented like a 
Assembly game. It is something which runs inside another program. 

The first critics was, that it is a waste of time for writing a graphical 
simulator, if a normal Forth interpreter runs much better. And indeed, 
current Forth systems like Jonesforth, Javascript Forth and Gforth are 
very fast and can execute all Forth statements. The problem with all 
these programs is, that they are forcing the user in a certain situation. 
If someone starts the Jonesforth interpreter, all what he can do is enter 
Forth code. And because most users doesn’t know how to program in 
Forth, they don’t like it. From a teaching perspective, it is better to 
force Forth into a certain position inside a clean environment. That 
means. Forth is only a textfile which is executed in a much broader 
system. And the broader system (the simulator) is written not in Forth 
but in C++ and has a nice GUI. 

According to the given literature this concept is very new. In the 
Forth community itself, no other project was done into that direc¬ 
tion. The idea of using simulation games for teaching a language 
is only used in the mainstream computing, for example the Little 
man computer project. Let us investigate this project in detail. The 
assembly community knows two types of users. Hardcare program¬ 
mers, which are using NASM like assembler for programming real 
programs. For example, they are writing a new intro, and they are 
doing so since many years. On the other hand we have Assembler 
simulations. That are programs which are compared to the NASM in 
a weaker position. The compiled sourcecode runs slow, and often it is 
not possible to make any useful things with it. So called “Assembly 
language simulators” are used by newbies to learn the language. The 
common feature are, that the system is per default in the singlestep 
mode, has a graphical interface and additional lots of documentation 
which explains what Assembly is. 

The Forth community has currently only one type of user: the 
hardcore programmer which is using Forth since 20 years, has pro¬ 
grammed his own FPGA and is fluent in programming Forth. That 
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Figure 2: Forth simulator 

makes it very hard for newbies to entering the community, because 
the tools which are there use are not very beginner friendly. Even 
the lowlevel entry Forth “Jonesforth” is a tool for experts. It has no 
graphical interface, mns Forth code very fast and is used for creating 
Forth apps from scratch. That means, Jonesforth is not focusing on 
beginners. 

Let us take a look into figure “Forth simulator’’ which shows the 
GUI screen. It is not a console app. but visualize the current Forth 
system. It has a datastack, a returnstack, RAM and an execution unit. 
Describing the system is simple: it was created with the SFML library 
in C++, that means it is a game comparable to pacman. The user can 
send some keystrokes to the simulator and the game answer with a 
certain behavior. The rules of the game are called Forth, that means 
it works like a stackmachine with two stacks. From a technical point 
of view, the system is not so powerful like Gforth or Jonesforth. It 
is comparable to an Excel chart and is not a real Forth system. The 
advantage is, that the user is supported in his learning experience. 
Even without understanding Forth the user can press some buttons 
and see how the system works. 

Currently, the simulator has only a few actions. The user can 
press up. down, left and right. This moves the instruction pointer 
and executes the command. The simulator works in a way, that it 
shows what is inside the stack. That means, the debugging mode is 
activated as default. According to the specs, the Gforth system has 
also a singlestep debugger. But it is not very pleasant. The same is 
true for the SwiftForth. Both debugger are created with low priority, 
in the advance that the user knows the language already from the 
textbook. But this is not true. The user needs a app because he want’s 
to learn Forth. 

2.7 Writing software in Forth 

According to most papers about Forth, the programming is done with 
a divide and conquer technique in which a problem is described with 
subwords. Most newbies are not understanding the idea and ask for 
an easier way of implementing Forth code. My impression is, that the 
language which is nearest to Forth is the assembly language. Coding 
for old 8bit Atari computers and Forth has much in common. Sure, 
Forth is something which goes way beyond assembly, because it can 
run on any machine and it is possible to design special cpu only for 
Forth, but the programming itself is the same. 

Let us investigate a small example how a program is created from 
scratch. Suppose the user wants to program a bubblesort sorting 
algorithm. The first step what he is doing is to ask google if someone 
else has implemented it. Maybe he finds a out-of-the-box solution on 


http://www.rosettacode.org or in one of the Forth forums. Because 
the community is very small in most cases he will not find any code. 
That means he must write the Forth code from scratch. The easiest 
way for doing so is not to separate the problems into words, but 
asking the assembly language community for help. The reason is, 
that programming on a lowlevel for a certain cpu type is done both: 
in assembly and in Forth. 

Let us explain who a typical assembly program works. Usually 
it has no variables but reserve space in memory. Usually, the pro¬ 
grammer is using subroutines, and usually the programming is done 
different from programming in C. I would suggest, that programming 
in Assembly for the Atari 8bit cpu takes around the same time than 
programming for the abstract Forth virtual machine. That means, the 
user must build up the complete program from scratch. Either Forth 
nor assembly has any predefined libraries or highlevel constructs like 
objects. Instead it is up to the programmer to program bare metal. 

The good news is, that the assembly community has researched 
over many years how to program large amounts of sourcecode. Usu¬ 
ally the learning curve is high, and it takes many weeks to program a 
game in assembly. And if an error occurs the debugging is time con¬ 
suming. The same is true for Forth programming. It is like assembly, 
only that the cpu has fewer registers. 

disadvantages Let us describe the problems in Assembly and Forth 
together. The main disadvantage is, that it is a lowlevel language. 
That means the cpu is executing only the given commands and in a 
certain order. In contrast, C++ sourcecode is structured more like 
a textfile. That means the exact order of the statements is not very 
important instead the programmers describes general structures. That 
means, larger piece of C++ code can be written without feedback 
from the compiler. Instead Assembly and Forth programming is pro¬ 
grammed against the bare metal. That means that every punctuation 
is important, because otherwise the system won’t work. And i would 
suggest that most of the time the programmer is working in the de¬ 
bug mode, that means he is using a tracer to follow command after 
command for verifying that the assembler is doing the right thing. 

Let us describe a typical bug. A subroutine is not doing what was 
expected. But where is the error? The assembler programmer has 
to go step by step through the commands and observe the content 
of the registers. For example he is recognizing that the accumulator 
has the wrong value, he traces the problem back to the command in 
the listing and modifies the sourcecode. The same error repairing 
mechanism is used by Forth programmers. The only difference is, 
that they are not documenting what they are doing. But in reality. 
Forth programmers are only special kinds of assembly programmers. 

Now a typical example how equal Assembly and Forth are. In a 
stackoverflow thread somebody asks for help because he wants to 
initialize a struct in assembly language. 5 He describes the problem 
very well and has also a screenshot of his assembler. For c program¬ 
mers the question seems very complicated, but it is the normal how 
assembly programming is done. The interesting point is, that the 
question is valid for Forth too, that means in a Forth programs the 
problems for initialize memory for a stmct are equal. 

I mean, either Forth nor assembly have any kind of datastructure, 
predefined way for accessing memory or standard libraries. What 
they are doing is working bare metal. That is the reason why pro¬ 
gramming in assembly takes so long, while programming in C is 

5 https://stackoverflow.com/questions/23547328/following-struct-pointers 
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faster. The difference is, that assembly programmers work on hard¬ 
ware which can be used by a c compiler as an alternative, while Forth 
programmers have not the option to fallback to C, because on Forth 
chips modern C compiler aren’t working anymore. 

Here another problem. .The task is to reverse a string. According 
to Roessate code the LC-3 assembly language can used for the task. 6 
The listing contains words like SCAN, COPY, COPIED which have 
subactions for manipulating the registers. The Forth solution which is 
also given on the website is using also words (pushstr, popstr, reverse). 
It is nearly the same solution, except that Forth is bit more academic 
than the primitive LC-3 command set. In both cases, the sourcecode 
is not very easy to read. It takes for a beginner at least 10 minute to 
understand it. In contrast the reverse string task can be solved with 
normal C++ way more easier. 

As a conclusion it makes no sense to compare Forth with C/C++. 
The languages are to different. But it makes sense to compare Forth 
with X86 assembly and LC-3 assembly. And I would go a step further 
and suggest that Assembly experts are also experts for Lorth. 

Let us answering the question how a complete operating system 
in Forth has to be programmed. Taking existing Linux sourcecode 
which is written in C and convert this to Forth makes no sense. The 
languages are very different, it is another culture. But, it makes sense 
for orientation on existing assembly OS, for example menuet OS. 
The sourcecode there can easily be transferred to Forth. It is the same 
programming style, and the same bare-metal mentality. This can 
be seen very clear, if we are comparing the debugging screen in an 
Assembly environment with the debugging screen of Forth. In both 
cases the programmer is going in singlestep through his application 
and traces back the register / stack values for identifying a mistake. 

The example MenuetOS consists of 1.5MB sourcecode. This 
shows, that it is possible to write larger projects on bare metal. I 
would suggest, that also a Forth operating system in the same size is 
possible. 

2.8 Forth emulator 

Before I want to describe the situation for Forth, a short look into 
the 8-bit retroscene which is using Assembly as their main language. 
What happens since the Commodore 64? A lot. If we are searching 
github for the term 6502 emulator, we will see, that at least 100 
projects are out there. Some emulators are written in C, some in C++, 
Java, Javascript and C#. The motivation behind such projects is the 
idea to learn something. Often the programmer have read the 6502 
manual and implements the function in a high-level programming 
language like C++. That means, they are building a computergame 
which has registers and an Accumulator. The Line of Code is remark¬ 
able low what a 6502 emulator needs. Some projects have not more 
than 2000 LoC and they will run assembly code, which was written 
for the C-64. 

The other way around is not very often done. I mean, to write in 
6502 assembly a C++ compiler. In theory, this is also possible, but I 
found no project in that direction. The reason is perhaps, that writing 
in 6502 assembly language is more complicated then writing in C++. 

Now we will come to the situation in Forth. My own project was 
equal to the above cited 6502 emulators. The idea was to use C++ 
and simulate Forth. This task was surprisingly easy. The reason is, 
that C++ is a very powerful language which can simulate everything: 

6 http://www.rosettacode.org/wiki/Reverse_a_string#LC3_Assembly 


Forth, 6502 assembly. Finite state machines, neural networks or 
whatever. The other way around was not successful. That means, 
mean knowledge about Forth was zero before the project, and is zero 
after the project. That means, it is very easy to write a game in which 
Forth can be executed, but it is hard to write Forth itself. This is the 
same situation like in the 6502 emulator case. 

Let us increase the difficult a bit. How complicated is the task 
to implement the ANS Forth standard with a C++ program? Not 
very sophisticated. The specification is available online and writing 
a C++ program which takes Forth code and give the result back is 
easy to create. It is a one man project. If somebody has experience 
in compiler writing he will mastering it in under a year. The result 
should be a C++ program which is 3650 lines of code and emulates a 
Forth system. 

But how demanding is the other direction? I mean, to use Forth 
for writing a C++ compiler, or to use Forth for writing a game? It is 
way more complicated. The reason is, because Forth is complicated 
like Assembly, and not very good documented. I would guess, that 
even a computer expert will not have a prototype after a year of 
programming and it is unclear if his Forth program will ever run. 

Let us increase the abstraction level a bit. .The first fact is, that 
writing cpu-emulators and compilers for languages is surprisingly 
easy. The number of github projects from that area is high and the 
number of needed lines of code is small. Easy means, that such 
projects can be done in a short period of time as a one-man-project 
and will result in a success. The second remarkable fact is, that the 
task is usually done with modern programming languages like C#, 
C++ or sometimes C. 

Another interesting fact is, that after writing such an emulator, in 
most cases the programmer has not learned the underlying system for 
example the MOS 6502 and can it program fluently, but what they 
have learned is the tool-language C++ which was used to write the 
emulator. That was also true for my own project. The knowledge 
about C++ has increased a little bit, while the knowledge about Forth 
remains on zero. 

Let us take a look at a typical “6502 emulator project”. According 
to the list 7 one MOS6502 emulator 8 was written in C++ and has 
passed the “Klaus Dormann’s test suite”. I didn’t compile the emu¬ 
lator by myself, but i trust the survey. Let’s take a deeper look into 
the github repository. It contains of 2 files. The .cpp file has 1315 
Lines of code and the header file has 189 lines of code. So we see 
a C++ project with 1500 lines of code. According to the average 
productivity, the programmer has spend 150 days on the project (10 
Loc/day), so it can be called a small nice amateur project. 

I’m explaining this in detail, because in the theoretical literature 
the idea of programming a complete CPU from scratch is described 
as very complex, and the beginner thinks perhaps, that this will takes 
years of development and will end with a failure. But in reality, 
emulator writing is something which can be done as a hobby, and 
it seems that the programmer had a lot of fun with it. The question 
which remains open is, why did he used C++ for writing an emulator, 
why did he not used a Commodore 64 assembler for writing a program 
in Assembly? 

With a bit searching, I’ve found some Emulator projects which 
were done in Assembly. That means, the programmer is using x86 
assembly for writing a 6502 emulator. But, the number of projects is 

7 http://www.6502.org/tools/emu/ 

8 https://github.com/gianlucag/mos6502 
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small, and the sourcecode is difficult to read. For example this one 9 
has 2800 lines of code in x86 assembly. 

Why? The reason why modern 6502 emulators are written in C++ 
and not in 6502 assembly itself has to do with the term legacy. The 
Commodore 64 assembly language is recognized as assembly. That 
means, the people are not longer using it for real projects. That means 
not, that the C64 or the 6502 is obsolete. The opposite is true: many 
retroevents are taking place and it seems, that today more people 
have an C-64 since in the 1980s as the machine was sold in reality. 
The problem is, that the programmers today are only talking about 
the machine and are not longer part of the community. That means, 
there are no real C-64 programmers out there. What they are doing is 
to place the CPU in a basket, simulate it with perfect accuracy and 
writing books about it. But what they are using in reality is a Intel 
CPU and the C++ programming language. 

Can we do the other way around? I mean, using old Commodore 
hardware and 6502 assembly language to emulate modem computers? 
No, it is not possible. Because fixing C++ sourcecode is easier than 
fixing Assembly sourcecode. And a PC is faster than the 6502 chipset. 

I wouldn’t call the C-64 death or obsolete, instead it is a legacy 
system. That means it is a closed system which is used for learning 
and teaching. And I would suggest that the Commodore 64 is the 
better computer for community building, than an ordinary PC. I do 
not know any PC or C++ programming club for example. Apple fan 
clubs are a lot of them, but I never heard of an Intel fanclub. 

2.9 List of Forth CPUs, IDEs and books 

List of FORTH books 

• Elizabeth.D.Rather: Forth Programmer’s Handbook 

• by Elizabeth D. Rather: Forth Application Techniques 

• Leo Brodie: Starting Forth 

• Leo Brodie: Thinking Forth 

• ANS/ISO Forth Standard, http://dl.forth.com/sitedocs/dpans94.pdf 


3.2 Speed of Forth 

So what’s the deal, why is SP-Forth so slow? At first, it is not the 
fault of SP-Forth. Other Forth systems like pforth or gforth are not 
much faster. The problem is, that the C++ compiler was designed for 
practical usage and is optimized for speed. The idea is to generate the 
fastest binary file which is possible. While Forth was designed as a 
teaching language. The aim is not to write code in Forth and execute 
this on hardware, the hardware is to use Forth code in theoretical 
papers without ever run the code on real machines. Forth is used 
mainly as a proof. 

The idea is not to focus on real hardware or real software, but 
invent everything from scratch. A forth program is not created for an 
existing computer like the x86 CPU, but the assumption is, that no 
hardware is there and we must invent a computer from scratch. And 
the minimal computer has only two stacks: datastack and returnstack. 
In contrast, the C++ compiler is progrmmed for real environment. 
The assumption is, that working hardware is out there, and a working 
operating system is available. For C++ it makes no sense to build 
around a virtual CPU, but the idea is to use existing hardware for 
producing software. 

Forth programming can be called a permanent reverse reengineer¬ 
ing. That means not to use the work previously done but to invent 
the wheel from scratch. That means, if someone wants to invent a 
computer from scratch, write his own compiler and invent his own 
operating system from scratch, than Forth is a here to stay. If the aim 
is to use existing hardware/software for writing something new, than 
C++ is better. 

Let us explain this in detail. The assumption under which Forth is 
operating is, that we are living in the 18th century. The first computer 
wasn’t invented yet and all what we have are punch cards and relays. 
The aim is, to built a simple machine from scratch, program the 
first compiler, and write a hello world program which runs. This 
is a job perfect for Forth. It is possible, to build everything from 
scratch: the logicgates, the compiler, the machine model and even the 
programming language. The result will looks a like a computer made 
of wood and has an overall speed of around 1 instruction per second. 
■The memory is limited. Forth is the slowest programming language 


J.V. Noble: A Beginner’s Guide to Forth, 

http://galileo.phys.virginia.edu/classes/551.jvn.fall01/primer.htm 


Where X=forth, 


ever, but it minimizes the amount of understanding of the system. 
Phil Koopman: Stack Computers: The New Wave, Homebrew CPUs projects are not created for the purpose of speed 
http://users.ece.emu.edu/^koopman/stack_computers/index.html or inventing something new. A homebrew CPUs is designed for 

replicating existing work. The idea is for example, to build a 8 bit 
homebrew CPU out of TTL chips. Why would somebody do so? 
The real 8-bit epu was invented 40 years ago, recreating this from 
scratch is done with an educational purpose. To understand how 
the system look like, and to gain knowledge about hardware itself. 
This is the main domain in which Forth is used. Forth is the natural 
language for homebrew CPUs. The resulting working system will be 
run very slowly (because it is an 8 bit epu) and Forth will decrease 
the performance further. The advantage of this bonsai plant is, that 
everything is under glass. That means it is fully documentated and 
can be seen as art. 

A good example for such projects is the “MARK I Forth computer” 
10 The performance of the machine is extraordinary slow, and the 
Forth program run on this machines solves no real problem. Instead 
the project is proof of how to build such computers. 

Let us investigate the MARK I computer in detail. What is his 
purpose? It is not possible to run a Webserver on the machine, it is 


Learn X in Y minutes , 
http s: //le arnxiny minutes. com/doc s/forth/ 

Forth FAQ, http://www.forth.org/faq.html 


3 Performance 

3.1 Power consumption 

• 5 femtojoule per transistor [1] 

• Apple A8X, 3000 million transistors, 4.5 Watt ->l,5*10 A -9 
Watt/transistor 


MuP21 Forth epu, 7000 CMOS transistors, 100 MIPS 


9 https://github.com/scantysnax/bender_x64/blob/master/bender.asm 


0 http ://w w w. aholme. co. uk/Mk 1 / Architecture .htm 



year 

name 

author 

note 

1962 

Burroughs 5000 



1962 

English Electric KDF.9 

Charles L. Hamblin 


1985 

Novix NC4000 

Charles H. Moore 

Novix Inc 

1988 

RTX2010 

Charles H. Moore 

16 bit https://en.wikipedia.org/wiki/RTX2010 

1989 

RTX32p 

Philip Koopman 

32bit, 2-chip 

1989 

RTX4000 

Philip Koopman 

planned commercial version of RTX32p 

1993 

F21 

Charles H. Moore 

500 MIPS , 15000 transistors 

1995 

MuP21 

Chen-Hanson Ting and Charles H. Moore 

2 chip system (second VGA) 

1996 

i21 

Charles H. Moore 

500 MIPS. NTSC video output 

1998 

MSL16 

Philip H W Leong 

similar to MuP21 

1999 

C18 

Charles H. Moore 

asynchronous, SEAforth 40C18 

2003 

bl6 

Bernd Paysan 


2003 

JOP 

Martin Schoeberl 

Java optimized processor 

2004 

MicroCore 

Klaus Schleisiek 


2004 

FC16 Forthcore 

Richard E. Haskell 


2006 

SEAforth-24 

Charles H. Moore 

24 core cpu 

2007 

eP32 

Chen-Hanson Ting 

together with eForth 

2009 

Green Arrays FI 8 

Charles H. Moore 


2010 

J1 

James Bowman 

Willow Garage 

2010 

GA144 

Charles H. Moore 

commercial product 

2012 

N.I.G.E. Machine 

Andrew Read 

softcore cpu 

2015 

Lund CPU 

Girish Aramanekoppa Subbarao 











Table 1: List of Forth CPUs 

google “site:http://www.forth.org/svfig/ cpu” 
http://www.ultratechnology.com/chips.htm 
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Name 

price 


VFX Forth 

1200 US$ 


SwiftForth 

399 US$ 


8th 

130 US$ per year 


Gforth 

GPL 


F-PC Forth system 

free 



Table 2: List of Forth IDEs 


also not possible to run Linux on it. So why invested the inventor so 
much energy in the project? Because he proofed something. A proof 
is known from universities. The idea is to repeat something which is 
already known. For example: everybody is aware that 1+1=2. The 
problem of how to add to number is solved. But instead go further 
and investigating more complicated things the same problem is asked 
as an open question. A possible proof for 1+1=2 looks the following. 

At first the teacher asks a question: What is 1+1? Somebody gives 
the right answer: 2. But that is not the answer the teacher want’s to 
hear. The teacher is not interested in math itself, he is iinterested in 
testing his students. It is important to ask who has given the answer, 
and what his motivation was behind it. Making a proof means, to deal 
with problems, for which the answer is known to build an education 
system. 

With the same purpose the Mark I computer was designed. How 
to program a computer is already known, but repetition is always 
good. The running Mark I computer can be used in a quiz. The 
teacher asks a question, for example, “What should i do to let the led 
blinking?” and the students must answer the question. The idea is 
not to discuss about computers, but build a school game, in which 
people are evaluated. 

Formal Forth 

“Forth provides us with a simple language with a program¬ 
ming environment and debugger. Due to the simple nature 
of Forth, this can be formalised much more readily than 
most other languages (Stoddart 1988)” [4] page 37 

It is a bit unfair to compare Forth directly with C. Because both lan¬ 
guages are made for different purposes. C was made as a quick&dirty 
language for real systems. While Forth was created as an intellectual 
challenge in theoretical computerscience. Forth can be best compared 
with the MMIX cpu of Donald E. Knuth. The “MMIX Visual Debug¬ 
ger” isn’t use for real programming tasks, it is a clean simulator for 
trying out algorithm in the university. The MMIX cpu uses lots of 
register. Forth is more restricted and has stacks. This makes Forth 
superior to the MMIX. It makes sense, to compare Forth with the 
MMIX, PL/0 or other educational computers. But it makes no sense 
to compare MMIX with C or Forth with C. 

Let us take a look into “The art of computer programming”. The 
book-series is equal to a game. It builds a knowledgesystem in which 
questions can be asked which can be answered wrong or true. That 
means, first the author Knuth explains what sorting is, and then the 
students can answer a question to the problem. If he is right, he gets 
a point from Knuth. The book “the art of computer programming” 
is created as a whole big proof. That means, nothing new can be 
found in the volume, but the purpose is to gamify existing knowledge. 
I wouldn’t call the book scientific or academic, it is more didactic. 
The idea is to use the book for teaching students in computer science. 


Teaching means, that the students can answer the question which are 
given by the instructors. Teaching means not, that the students are 
exploring topics outside of the course or find new answers. Donald 
Knuth is a wonderful teacher. He is very good in proofing something. 

Let us take an example excersise from “The art of computer pro¬ 
gramming”: 

“In a random sequence of a million decimal digits, what is 
the probability that there exactly 100,000 of each possible 
digit?” [6] 

“Write a MIX program analogous to (2) that computes (aX) 
mod (w-1).” [6] 

In the book are much of the above cited exercises and they have all 
in common. The answer to the problem is known. At least by Knuth 
or it is printed in the reference section as the correct answer. The 
question are well suited for testing students, if they have understand 
the book. That means, Knuth didn’t invented algorithm theory, he 
invented the SAT test (Scholastic Assessment Test). Somebody may 
call the book from Knuth as mischief, but it is the opposite. It 
shows, what a mathematical proof is, and it can be used in a teaching 
environment. Proof means, to ask for something which is well known. 
It is some kind of riddle which is meaningless by itself, but defines 
the relationship between a person who has the power to educate the 
opponent. The difference is, that Donald E. Knuth can let a student 
fail a test, while the opposite is not possible. That means, Knuth 
creates the test for an exam, while his students must attend the test. 

3.3 Performance 

According to the official Forth FAQ a program written in Forth is 
around 4-8 times slower than the same algorithm written in C. 11 So 
what’s the deal, why are people using this slow language? Let us first 
try out the c-language in detail. If somebody is writing an algorithm 
in C he uses a compiler for producing machine code. Modern C 
compiler are really good in the task, in most cases the compiler 
generated machine code is as fast as handprogrammed assembly 
code. That means, a good c compiler let the hardware running on 
its maximum. Suppose we have written a primenumber algorithm 
in C and with -03 compilerswitch the programs needs on a 2 Ghz 
AMD64 cpu around 100 seconds until it is finished. How we can 
improve the speed? With Forth it is not possible, and with a better 
C compiler also not. The problem is the underlying hardware. If we 
want to run the same program not in 100 seconds but in 1 seconds we 
need a faster cpu. 

And here comes Forth into the game. The idea is to not use small 
optimizing techniques. For example, a better cpu architecture or a 
transputer which has many cpus at once is a big optimizing technique. 
And this is the reason why people are playing around with Forth. The 
idea is to not using small optimizing technique with the factor 5, but 
using big optimizing techniques in the factor of 1000. If we have 
for example a transputer with 1000 cpus at once, our algorithm will 
much more profit from it, than any c compiler can deliver. 

3.4 Multi-core Forth machine 

A possible computer consists of a large amount of main memory, 
for example 100 GB in DDR4-RAM. The memory consists of cells 

u http://www.forth.org/faq/faql.txt section #3.5 
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Figure 3: Two Forth CPUs on one tape 
Left is the datastack which has [5,3] while right is the return stack. 
The user can move the CPUs with pressing the arrow keys. Fie 
can also start the “fetch-decode-execute” cycle which is starting the 
normal Forth inner next loop. 


which containing data and program code. On the memory, many 
Forth CPUs are operating in parallel. Other modules like caches and 
registers do not exists. Let us go into the details: 

The memory has an address space. It starts with $0000 and goes 
up to $FFFF. For example on address $1000 the main program starts. 
The first command is “4”. According to the Forth syntax the number 
is put on the datastack. .The next command on adress $1001 is also 
a number “2”, it is put on the stack. The next command at adress 
$1002 is the plus sign “+”. This adds the two numbers on the stack. 
What the Forth CPU is doing, or better the pointer of the Forth CPU 
which points to the Memory is simple. He is executing commands. 

Now let us increase the complexity and not only use one Forth 
CPU but two of them in parallel. It is not possible that they are 
operating on the same piece of code. This would results into the same 
problem like in normal computing, for example CPU 1 overwrites a 
memoryadress which is needed by CPU2. But it is possible to define 
separate threads which runs in parallel. In the Linux operating system 
this is called a background process. That means, CPU1 is executing 
code from $1000 to $1200 while CPU is executing code from $2000 
to $2040. The threads are separated from each other. 

If the programmer defines 100 threads, then 100 Forth CPUs can 
run simultaneously on the memory. Because each Forth CPU is 
minimal from design it has only 2 stacks: datastack and return stack 
plus some minor features like an instruction pointer. It is a minimal 
computer which consists of not more than 1000 transistors. This 
makes it possible to construct many thousands of Forth CPUs which 
runs in parallel on a large amount of memory. The only bottleneck is 
the bandwith of the RAM. 

The problem is, that RAM can only accessed by one CPU at the 
same time. For example: on timeindex 0.1 seconds CPU 1 is reading 
a cell, on timeindex 0.2 seconds CPU2 is reading a cell and so forth. 
The memory controller manages the access from the cpus to the 
memory. If the bandwith is low, the overall system will be very slow. 
That means we have around 1000 Forth CPUs but they can not access 
the memory at the same time. They have to wait. 

Another problem is, that normal C sourcecode will not run. It is 
not possible to convert existing Linux programs into Forth syntax. 
And even this would be possible, it makes no sense because a sort¬ 


ing algorithm like Bubblesort has to be implemented different on a 
stackmachine. That means, the programmers have to rewrite all the 
software from scratch. This results into new operating system, new 
video codecs and new applications. 


3.5 Why performance benchmarks are useless 

Most amateur programmers love benchmark. They compare for 
example the execution speed of a C++ compiler with the speed of 
Java. The idea is to evaluate a certain technology with an objective 
judge and to be on the side of the winner. In the case of Forth a 
benchmark makes no sense. In many papers about new invented 
Forth software some kind of performance benchmark is given, but 
it says nothing. The reason is, that the intention of Forth is not to 
replace existing CISC CPUs or existing programming languages. If 
somebody want’s to create a Forth benchmark then only with the aim 
to build a very slow Forth system. 

My own Forth language simulator has right now a performance of 
around 1 instruction per second. This performance is reached with an 
artificial pause, which is executed in the thread. The normal SFML 
command was used to wait for exact 1000 ms. The result is, that the 
overall gameloop runs with 30 fps to get the movement smoothly and 
for reacting to user input, while the execution of the simulated Forth 
CPU is slowed down to 1 step per second. Otherwise, the user won’t 
see very much, he would start the program and get a millisecond 
later an error command, because his Forthprogram is wrong. With an 
artificial slowdown the user can see in stopmotion what the simulated 
Forth CPU is doing and how the stack is filled with values. 

My plan is to reduce the execution speed further, down to only 0.5 
steps per second. The reason is, that in a dualcore CPUs two Forth 
VM are executed in parallel, and if both are running with maximum 
speed, it is hard for the user to get both in observation. 

The intention of a Forth benchmark is not to run a given program at 
maximum speed. That is the privilege of production ready program¬ 
ming languages like C++.: They are used for writing real software. 
Instead, Forth is an education programming language, that means a 
slow execution speed is better. A similar project is the GNUSim8085 
software, which is a tool for learning assembly language. The per¬ 
formance of the software is very slow, but this principle is right for 
learning Forth too. 

GnuSim8085 is not the most sophisticated assembler. And most 
feature of GAS and NASM he didn’t have. It is more a teaching envi¬ 
ronment, that means it is not possible to write real assembly programs 
with it, but it is a good tool for playing around with Assembly for 
people who have no knowledge. 

Let us investigate how a dedicated Forth CPUs is presented: 

“MISC processors will be much cheaper than RISC and 
CISC processors, and can compete effectively against them 
on the basis of favorable price/performance ratio.” 12 

According to this statement the MuP21 is superior to normal CPUs 
because he is faster and needs less energy. This argument is not very 
relevant, it misleads the user into the direction, that he can use Forth 
to replace existing computers. If we want to promote Forth, than we 
should argue that Forth is slower and needs more energy. That means, 

12 


http://www.ultratechnology.com/mup21.html 
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a standard x86 PC in CISC architecture is able to run with around 10 
Gflops, while a Forth CPU is for the factor 10 10 slower. The question 
is not how to shrink the size of the gate to compete with existing 
technology, the question is how to build a Forth CPU in hardware 
which is really slow, and how to program a Forth simulator which is 
even slower. 

According to the specs, the MuP21 CPU was produced with 1.2 
micron CMOS process. That is too tiny. What’s about a Forth CPU 
build with a transistor size of around 1 cm? If the chip has 1000 
transistors, the cpu would have a length of 10 Meter, which needs a 
special room for it. The advantage is, that the museum visitor can 
observe the device, walk around and touch it. 

3.6 Forth has a big performance problem 

Some papers about Forth are discussing the performance issue. The 
intention is to make clear, that Forth is a fast cpu architecture which is 
better than other systems. In reality, the performance problem is not 
about speed but it’s the opposite. Perhaps I should give an example.. 
My new Forth language simulator has two CPUs who are working 
simultaneously on the same tape. The user types in the command 
“Run” and both CPUs are start working. It feels a bit like in a realtime 
strategy game, in which the user controls two units at the same time. 
The problem is, that it is hard to observe the scene. Even a artificial 
timelag of 1 seconds is used between two actions, the scene runs too 
fast. The problem is that it is easy to follow one cpu, but two cpus at 
the same time with a separate datastack is too much for the normal 
user. 

How to overcome the problem is unclear, but increasing the per¬ 
formance is the wrong way. The better answer has to do with more 
timelag. Perhaps I should increase the waiting time from 1 second to 
10 second, so that the user has time to see what exactly the stackma- 
chine is doing? 

What helps a bit is to not run the cpu in parallel but after each other. 
Not because of technical requirements, but this makes clear, that 
the user can follow the process. Perhaps I should describe what the 
worst case scenario is. A bad idea is to remove the artificial waiting 
loop, because this results into a not controllable behavior: the user is 
entering “run”, and without any delay, both cpu are at the end of the 
program. From the performance perspective that is the fastest mode. 
But the user sees nothing. It is useless to run the simulator in that 
mode. 

3.7 Prefetching for the UNIVAC I tapedrive 

Introduction The rewind time of the tape drive was around 60 
seconds, sometimes this is called “access time”, because it takes one 
minute until a certain adress is reached. If the CPU has no core 
memory and no cache, he writes direct on the tape. A command 
sequence like “read #00, read #A0, read #10”, takes each 60 seconds 
until the tape is adjusted to the adress. So the Univac needs 3*60=180 
seconds for the operation. 

Main The UNIVAC I computer was made in 1951. His tapedrive 
had a rewind time of 60 seconds. That is also called the access time. 
6 tapedrive units can be run in parallel. Let’s assume we have a batch 
task. From tapel we copy something to tape2. From tape2 to tape 5 
and so forth. The exact batch is described in the figure 4. Because we 
need 6 read and 6 write access the overall time is 12 minutes until the 
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Figure 4: prefetching for tapedrive 


job is done. The term prefetching means, to not tolerate that a tape 
has nothing to do, instead it is started in advance. From the beginning 
we give every tape a task and this results into a smaller overall time. 
It can be reduced to 2 minutes. The average access time is no longer 
60 seconds, but it is: 

120 seconds overall 

-= 10 seconds per task 

12 tasks 

Can this be true that the virtual rewind time is only 10 seconds, 
while the physical rewind time is 60 seconds? Yes, the term is called 
pipelining or RAID and means, that every device is under load, and 
they are working together. More inexpensive tapedrives will decrease 
the virtual rewind time further. 60 seconds down to 10 seconds is 
equal to 1 tapedrive up to 6 tapedrive. That means, if our goal is an 
accesstime of 1000 msec, we need 60 tapedrives in parallel. 

4 Hardware 

4.1 Parallelcomputing 

Sometimes the Inmos Transputer is called a parallel computer, be¬ 
cause it combines many cpus on one chip. But in reality, the so called 
Harvard architecture is also a parallel machine. The signal pathway 
is used for connecting subcomputers together. Let us describe a small 
example. Suppose the assembly command “mov ax,bx” should be 
executed. For doing so, many substeps are needed. If we are running 
the steps after each other it takes some time. The alternative is to 
construct many small units which are handling one piece of work and 
then we are combining these unit with an assembly line. The assem¬ 
bly line on a cpu is called signal pathway. On youtube an animation 
of a working Intel 8085 is available which shows the principle 13 . It 
looks similar to a Simcity game in which cars are driving around. 
The most dominant feature is, that the signal are independent from 
each other. They are running at the same time. 

Let us describe the situation with a city. At 09:00 AM person 1 is 
driving with his car to school. At the same time at 09:00 AM person2 
is driving with his car to the hospital. At 10:00 AM everybody is on 
his place. The global clock has only recognized that the process has 
taken 1 hour. But in reality, the time for driving was 2x1=2 hours. 

13 https://www.youtube.com/watch?v=nABqe5SctHY 
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Todays cpu’s are described by their speed in Gigahertz. 1 Ghz is 
equal to 10 A 9 operations per second. A very good cpu reaches 10 
instructions per clockcycle. 

Queue machines [2] introduces queue machines on an example. 
We have the formula 

5*(2 + 8))-(4/2) 

and the idea is to calculate the result in parallel. At first, the abstract 
syntax tree is generated, and this tree is executed in parallel. That 
means, 2 + 8 is calculated at the same like 4/2 be different execution 
units which are normal dualstack pushdown-automata. 

Threading In mainstream computing the idea of threading is well 
known. For example, the C++ programming language is supporting 
this feature as default. The reason why threading is used not very 
often has to to do with the fact, that according to the hardware only 
a few cpu cores are available. That means, if the programmer is 
creating 2 threads, this double not only the performance but also the 
power consumption of his CPU. But threading itself is the answer to 
every performance bottleneck. It must be used with many cores at 
the same time. 

Let us first investigate how we can decide if a program can be run 
parallel or not. There are two options, the first one is called automatic 
pipelining and it is used on the hardware side. That means, a chunk of 
assembly code is loaded into the cache of the cpu, and the processor 
is investigating if a task can be run in parallel. This decision is mostly 
wrong and the possible performance improvement is only minimal. 
The other option is to let the programmer decide if his program can be 
run parallel or not. A typical example is a brute-force solver for the 
traveling salesman problem. The programmer can define 10 threads, 
and each one is calculating a piece of the problem. 

Let us describe a situation in which threading can speed up the per¬ 
formance. At first we need many cpus, for example 1000. And then 
we let the programmer decided how to use these cpu in parallel. That 
means, it is not necessary for doing the pipeline step automatically 
on the abstract syntax tree of the program instead the programmer 
decides how to map the work to the hardware. 

In theory, the mapping can be done automatically, for example with 
analyzing the bytecode and decide according to rules which chunk of 
the code can be executed parallel. But, this advanced computing and 
for parallel computing it is not necessary to do so. 

Suppose, we have 1000 different dual-stack pushdown automaton 
arranged in parallel. A normal sourcecode written in Forth can’t 
be executed by the stackmachine cluster, because it is unclear if a 
subroutine needs as input the output of another subroutine. But, if 
the programmer has decided from the beginning, that all routines can 
be done in parallel, there is no need for waiting of the results of a 
previous step. That means, the programmer has scheduled the tasks 
of his “Traveling salesman problem” manual to the computing cores, 
and they are starting to calculate. 

It is correct, that it is difficult or sometimes impossible to run 
sourcecode in parallel. The problem is, that in the sourcecode is 
defined: 

input (takes 10 seconds) 
calcl (take 20 seconds) 
calc2 (takes 60 seconds) 
output (takes 10 seconds) 


and the “output” routine can only be started, if the previous sub¬ 
routines are done. And is indeed an interesting question, if we can 
analyze the sourcecode (or his abstract syntax tree) in way that we can 
automatically decide which commands can be executed in parallel 
and which not. But, this question is only a minor problem in the 
subject of threading. It solves the question what the cpu can do, if the 
programmer didn’t defined manually what codeparts can be threaded. 

The more easier way is, if the programmer has already written 
explicit, that the calc task can be run in parallel: 

input 

parallel: calcl , calc2 
output 

Here is the situation clear. We can assign the calc task to different 
cpus. But, the programmer will only define task as thread ready, if 
the hardware is able to run the task with higher performance. Todays 
computers only have 2 cores, and it makes no sense to specify a 
complex threading hierarchy. 

4.2 Why Forth is better than Intel Xeon Phi 

The latest iteration in Intel products is called Intel Xeon Phi, “Knight 
Mill” According to the specs, the processor can deliver 3.5 Teraflops 
and consumes 300 Watt energy. At first, we must calculate the 
Gigaflops per Watt value, which is 11.7. Then we can calculate 
the energy consumption per flops, which is 85 picojoule per flop 
(l/a*1000). Now we compare this value with the GA144 forth CPU 
by Greenarray, which needs only 7 picojoule per instruction. 14 So we 
have as the result: 

• Intel Xeon Phi, 85 picojoule per flops 

• GA144 Forth CPU, 7 picojoule per Instruction 

The Greenarray chipset is around 12x more energy efficient than 
the Intel hardware, which is a lot. So we have proven that Forth is 
superior to Intel engineering. 

4.3 Pipelining 

Understanding Multiprocessing and Pipelining isn’t easy, but it seems, 
that the parallel executation of many tasks the key is for increasing 
performance of a CPU. In the figure “pipeline” an example is given, 
which is well known from other introductions into the topics. The 
overall tasks consists of steps like “prefetch, fetch, decode and so on”, 
and the tasks are scheduled to threads. The first surprising fact is, 
that a thread is not responsible for a timestep, but a thread is used for 
run the entire task. What does that mean? In C++ we would create a 
cycle with the command from figure “Thread in C++”. That means, 
not every subtask is run by a different thread, but we have only one 
thread, which has the complete game-loop. The idea is, to run many 
game-loops at the same time with a small offset. 

Let us investigate the table a bit in detail. The main problem of the 
fictional cpu is, that the subtasks have an order. That means, we can 
execute something only after the value was decoded. It is not possible 
to change to order, it is a fixed workflow. Let us watch, how the table 
has solved the problem. The “execute” subtask is highlighted in grey. 
What we can see is, that the CPU isn’t run the subtask at the same 

14 http://www.greenarraychips.com/home/documents/greg/PB001-100503-GA144- 
1-10.pdf 


13 



void cycle() { 

prefetch(); 
fetch (); 
decode(); 
execute(); 
free(); 

} 

std::thread tl; 

tl=std::thread(cycle, this); 

t1.detach(); 


Figure 5: thread in C++ 
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Figure 6: pipeline 


time, but with a slow speed after each other. From the perspective of 
“execute”, there is no pipeline, because in every timestep one subtask 
is executed. 

But, the overall figure shows that all the tasks are running in 
parallel. We see a virtual CPU which is vertical in the table. The 
virtual CPU goes from bottom to top and contains all the subtasks: 
prefetch, fetch, decode and so on. And they are executed at the same 
time. Let us go further into the details. The constraint is, that the 
workflow has a certain order. That means, the subtasks “prefetch, 
fetch, decode and so on” can’t be run at the same time, but must 
executed after each other. “Fetch” is only possible if the previous 
step was done. Every sub task needs a certain amount of time, for 
example 1 second. If we have 5 subtasks, with each 1 seconds, then 
it takes 5 seconds for the overall task, right? 

Our aim as the manager of the game is to reduce the duration. Let 
us compare two options. The first one is the sum of the seconds. 6 
threads with each 5 seconds needs 6x5=30 seconds. A look in the 
figure “pipeline” shows us, that it is possible to run the threads only 
in 10 seconds, because the last action is ready in the timestep 10. 
What is the trick? 

At first we can calculate the raw numbers. 10 seconds overall 
time consumption is equal to 1.66 seconds per thread (10/6). That 
means we have increased the speed from 5 seconds for running a task 
downto 1.66 seconds. The same reduction is done for each subtask. 
Prefetch needs no longer 1 seconds, but only 0.33 seconds. Can this 
be true? 

According to the table we have 30 subtasks overall. Each of them 
needs 1 seconds. On the same time, the measured time for executing 
the subtasks is only 10 seconds. And that is equal to 0.33 seconds 
x 30 = 10 seconds. The measurement is correct, the duration of a 
single task has the normal 1 seconds, but it has also 0.33 seconds. 
The second reduced duration is the result of the above mentioned 
virtual CPU. It is something which we have constructed by ordering 


the task in a pipeline. The virtual CPU needs for the prefetch subtask 
only 0.33 seconds, and the same amount for all the other subtasks. It 
is a super-processor which is the result of the coworking threads. 

workpackage In the timestep 5 a batch-job is executed by the 
virtual cpu. The request contains of the following tasks: 

• prefetch instruction #5 

• fetch instruction #4 

• decode instruction #3 

• execute instruction #2 

• free instruction #1 

Because the subtasks are dealing with different instructions, they 
can be run simultaneously. The physical cpu gets this batch-menu. 
The reason why the subtasks are executed is unknown. Prefetching 
instruction #5 has nothing to do with “fetch instruction #4”. 

4.4 A cache for Forth 

Classical mainstream computing contains of RAM, second level 
cache and first level cache. The idea is, that an algorithm - written 
in C - can access fast to every cell of the memory. For example, 
an image with 200 kb is stored in RAM and the CPU is deciding to 
take parts of the data into his first level cache to run a compression 
algorithm on it. What is the Forth way of doing so? 

At first, a Forth CPU has no first level cache. It has only the 
dualstack and the memory. Access to the RAM is expensive and 
storing large amount in the datastack is not possible. How does a 
Forth CPU stores large amount of data? This is unclear, perhaps 
it is not possible. Let us watch a real Forth CPU like the GA144. 
The total amount of RAM on the chip is small. Instead the system 
is optimized for small amount of data. For a normal algorithm this 
could be possible, but what is in the case of large image files, which 
are need 200 kb and more of storage. Where do we are storing them? 

At first, let us investigate an option to overcome the problem with 
traditional methods. The need is to store 200 kb near the CPU, so 
that the CPU can access the data fast. Doing so is possible with a first 
level cache plus a register based machine. The registers of a CISC 
machine can be seen as the zero-level-cache. It is the fastest sort of 
memory direct on the chip. So the answer to the storage problem 
is not using Forth and switch to a modern register cpu including a 
cache. 

This is what we see in reality, but perhaps it is possible with 
Forth to give another answer. A stackmachine is per definition not 
a registermachine, that means, we can not add registers or a cache 
to it to improve the memory. And using a stackmachine together 
with a big RAM makes no sense, because the access to the RAM is 
slow. The Forth way for solving the problem is to use an array of 
stackmachines. That means in reality. A standard Forth CPU consists 
of a datastack which is not deeper than 5 bytes. That is the total 
amount of RAM the system has. If we need more RAM for storing a 
200 kb image, we need more Forth CPUs. 40000 single Forth CPU 
with each 5 bytes = 200 kb RAM. Such a system is called in literature 
a stackbased “Cellular Automaton Machine”. 

This sounds a bit complicated so a short example may clarify the 
point. Suppose the image we want to store is 200 kb in size. We are 
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divide the image in 1000 parts, each is 200 byte long. The 200 byte 
are the memory of a Forth machine. Additionally the CPU has 5 bytes 
for the datastack and 5 bytes for the return stack. So we have 1000 
mini computers which have access to a different part of the memory. 

Each Forth CPU can read/write only 200 bytes. This is some kind of 
intelligent RAM. The only problem is, how to program the tiny little 
devices? 

SFML 


4.5 Forth GPU 

The topic of a graphic card is ignored in the Forth community. The 
reason is, that a standard microcontroller doesn’t have such a device. 
But from the perspective of understanding Forth it makes sense to 
focus in detail on the topic. The application of visualizing pictures 
has a lot of interesting challenges. At first, the usually graphics card 
RAM is big in size. A standard display has at least 1 million pixel 
plus color information (= 1 MB). If the graphics card should render 
3d graphics the number of needed pixels is higher. Also the painting 
of shapes likes lines is very computational intensive. That means, if 
someone believes to understand computing and want to minimize the 
hardware, than building up a graphics card from scratch is a good 
place to start. 

[7] is one of the few paper who are describing Forth together with 
graphics processors. Finding such information is difficult, because 
most papers of the 1980s are on microfiche and not online available 
and searching for the term “Forth” brings in most cases no useful in¬ 
formation. Another example, which uses the GA144 cpu for graphical 
output on a watch is given on github. 15 
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Figure 7: Stackmachine in SFML 
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5 Examples 


5.1 Implementation 


Visualizing a dual-stack pushdownautomaton is relativly easy with 
the SFML game engine. In the following figure the first prototype 
can be seen. Instead of programming another “Forth like system” on 
a x86 cpu this project has the purpose to program against a graphic- 
display. That means, the Forth VM is only visible onscreen as some 
kind of interactive game. The user can so far control the instruction 
pointer with left/right button, and he can even push some values to 
the datastack. It is done with a textcommand. The idea is, that the 
game can interpret a normal Forth program and run it in singlestep 
mode. 



5.2 Testing out SP-Forth 

In the following section, I want to describe a concrete Forth system. 
I’ve choosen the SP-Forth system 16 , because it has the feature to 
compile Forth code into native x86 machine code, which improves 
the performance. The first step is to download the 744 zip file for the 
Linux operating system. After unpacking lots of documentation, ex¬ 
amples and the SP-Forth executable file is visible. The file “./spf4orig” 
is 64kb in size and after executing it, the screenshot “SP-Forth” is 
shown. 

Instead of the Gforth system, SP-Forth needs the command in big 
letters. A demo program can be: 


Figure 8: SP-Forth 


15 https://github.com/mflamer/GA144-watch 
16 http://spf.sourceforge.net/ 
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# hi . f 

: MAIN 10 0 DO I . LOOP ; 

MAIN 

BYE 

# commandline 

. /spf4orig hi.f 


Figure 9: Hello world 


: MAIN 10 0 DO I . LOOP ; 

Ok 

MAIN 

0123456789 Ok 

A problem with SP-Forth is, that the documentation is only avail¬ 
able in Russian. One advantage is, that it supports multitasking out 
of the box. But let us try out something different. We want to run a 
program from a file. At first we need the same Hello World program 
in a file “hi.f”, with a small modification and then we can execute the 
program together with SP-Forth. The result is the same, the numbers 
from 0 to zero are printed on the screen. 

A speed comparison between SP-Forth and C++ is always possible, 
and the SP-Forth system is way more slower, 
end of chapter 


sp-forth: 

: SUM 

0 1000000 0 DO I + LOOP 


: MAIN 1000 0 DO SUM LOOP ; 

MAIN 

BYE 

time ./spf4orig sum.f 
real 0m2,435s 

user 0m2,425s 

sys OmO,005s 


6 Other languages 

6.1 BCPL vs Forth 

In the late 1960s a major decision was made in history of computing. 
It was a showdown between the Forth system and the BCPL language. 
The mainstream in computing decided, that BCPL is a here to stay, 
while the Forth idea was ignored. 

Let us go back in time and investigate what the difference is. BCPL 
was the predecessor to C, and C evolved into C++. Also BCPL is a 
programming language dedicated to register-based cpus. In contrast, 
the Forth VM was oriented on a stackmachine and evolved into factor 
and joy and modern Forth2012 standard. The contrast between both 
developments is massive. While C like language are on top of the 
TIOBE index, is Forth usually on the last place. Both developments 
can be seen as contradiction. 


C++: 

#include <iostream> 
void sum() { 
auto result=0; 

for (auto i=0;i<1000000;i++) 
result+=i; 

std::cout<<result<<" 

} 

int main() 

{ 

for (auto i=0;i<1000;i++) 
sum(); 
return 0; 


g++ -03 sum.cpp 
time ./a.out 
real OmO,137s 

user OmO,132s 

sys OmO,004s 


At first, i want to describe the idea behind Forth. The Forth VM is _. , _ , , 

„ , , , , , Figure 10: benchmark SP-Forth 

using an abstract computer, called two stack pushdown automaton. It 

has a tape, a datastack, a return stack and an execution unit. Program¬ 
ming in Forth is equal to use dataflow programming for the Forth 
virtual machine. In contrast, the idea of BCPL was to define also a 
virtual machine, but the BCPL virtual machine is using 5 registers, 
according to the orginal paper: Accumultor, some index register, 
programcounter and so forth. The idea of the BCPL virtual machine 
evolved into todays computer hardware which is called a CISC ar¬ 
chitecture. Such systems were programmed in C like languages like 
Python, C, C++ and Java. 

Which architecture is more powerful? For answering this question 
let us describe what the disadvantages are. Suppose we want to 
program in a C like language with lots of variables and compile the 
sourcecode to a stackbased processor. It is very difficult to do so. 

Mapping a high level programming language onto a stackmachine is 
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not possible. Instead what the hardware manufactorers are doing is the 
other way around. They are modifying the cpu to support highlevel 
programming languages. That means in reality they are increasing 
the number of registers to at least 32, and they are increasing the 
number of machine opcodes for supporting multimedia and other 
operations. 

Now we want describe in which the above cited CISC architecture 
is not very good. If the aim is to research computer programming and 
hardware design on a formel level, than are languages like C not the 
best idea. The reason is, that it is not clear how many registers the 
cpu has, or what the standard of a language is. A c like programming 
language can be anything, and a BCPL like virtual maschine too. It 
is not possible to describe algorithm on a formal level. 

What we seen today is both. Computer scientists are loving Forth 
based stackmachines, because they are so simple and can be seen 
as a runtime version of a turing-machine. A virtual machine in 
stackmachine architecture is a general computer and is well suited 
for analyzing the runtime behavior of an algorithm. It is also possible 
to create new hardware with it. 

On the other hand, C like programming languages and the under¬ 
lying CISC based hardware is very common in realiy. They can be 
used for implementing real operating systems like Linux and write 
software in business environment. 

The major difference between BCPL and Forth is, that the BCPL 
architecture has evolved over time. Today, the C language looks 
different and the CPU which is executing the commands too. That 
means, the AMD64 architecture is very different from the 386’er 
systems from Intel. And todays C++ is another language than the 
earlier BCPL language. In contrast. Forth like systems looks the same 
over decades. In the 1970s the Forth virtual machines consists of two 
stacks and today nothing has changed. 

In bad literature, not a stackmachine but the turing machine is 
described as a universal model for a computer. But a turing machine 
is less important. The problem with turing machines is, that the 
program code is separated from the tape. Usually, the tape represents 
the input and the runable code is located in the head of the turing- 
machine. A second problem with turing-machines is, that the code 
in head is not changable, it is a Finite state machine. In contrast, the 
Forth like VM is the better model. It can also run every algorithm but 
can programmed with subroutines and variables. 

Often there is a misconception out there. The idea is, that it is 
possible to program in Forth like in C++. The idea is, that Forth is 
simply another programming language. But it is more. Forth has 
more in common with a dataflow language for a abstract machine. 
That means it is not possible to program Forth without interactive. 
Perhaps an example: 

A C++ programmer is oriented on the sourcecode. He writes down 
2 pages of sourcecode like he write a text. And after 30 minutes, he 
presses the run button to see what his program his doing. It is batch 
workflow, and if the programmer has experiance, the time between 
every run of the program is longer. That means, the programmers 
sees only from the sourcecode what his program will do. He uses the 
compiler only for error checking. 

In contrast a Forth programmer works interactivly. What his ma¬ 
chine will do is unclear, because new defined Forth words are not 
predictable. The idea is not to program something, but to prove if a 
program is correct. Let us describe the difference in detail. 

C++ programs are written because the user needs software. He 
writes for example a game, and if the sourcecode is complete the 


computer can do something interesting. That means, C++ program¬ 
ming is done only for creating sourcecode. The idea is to increase the 
number of software. 

Forth programming is done with a different goal. Forth is mainly a 
teaching language. The idea is not to program with Forth a database 
application or a game, the idea is to use Forth as a tool for educating 
students about computer science. That means, the typical Forth 
projects has not produced anly lines of code after 1 year, but around 
100 students with a diploma in computer science who are familiar 
with hardware and algorithms. Instead, C++ is not very well suited to 
teach students in computerscience. C++ is a language for the practice. 
It has no clean design, and is driven by learning from doing. 

Anti pattern At the end of the small comparison between BCPL 
and Forth I want to describe a behavior which results into failure. It is 
an antipattern which is ignoring the advantages of a language. What 
the user is not recommended to do is using Forth as a programming 
language for creating an operating system or a GUI application. Also 
not recommended is the idea, to use BCPL for teaching computer 
science at the university. It will fail also. 

BCPL can be described as a language for writing computer source- 
code, while Forth is an education tool for teaching students. That 
means, if a professor at the university, uses Forth to explain what a 
compiler is, then the professor is really good in his subject. And if a 
programmer in daily life is using a BCPL like language for creating 
software, then is also an expert in his domain. 

6.2 C++ vs Forth 

The difference between Forth and C++ can not be greater. Both 
languages are very old. The development of C++ started in the 1970s 
with the BCPL compiler which were later extended with Simula-like 
features. The stackbased Forth language was developed in the same 
time. Both languages can be seen as unique, that means they are very 
powerful. But it is not possible to switch between them, because their 
philosophy is different. 

At first I want to describe the mindset between C++. C++ is 
a very fast language, which compiles on standard-processor into 
productive code. The OOP features of C++ makes it easier to program 
large applications, maintain them and use many programmers in the 
same project. That makes C++ the number 1 language for game 
development. A typical game consists of 10000 different sprites, 
which different behavior which are interacting with the game-engine. 
C++ handles this complexity and allows the programmer to create 
sophisticated programs. 

The Forth language was designed with a different purpose. Their 
main application is in science-classes at the university. Forth is 
a minimalist language. It is possible to explain the working of a 
complete CPU, an operating system and a compiler with Forth. The 
idea behind Forth is not to grow program complexitat, but to reduce 
it. If a Forth program has 2 kb in sourcecode, a much better program 
occupies only the half. The advantage of Forth is, that researcher 
can try out on a theoretical level many kind of architectures. For 
example, if they are parsing Forth sourcecode into an abstract syntax 
tree (AST) they can investige which parts of it can be run in parallel. 
Creating the abstract syntax tree for a C++ is way more complicated. 

The best practice method is to use Forth and C++ together. C++ 
as a compiler for creating real applications. And Forth as a research 
plattform to play around with new algorithm, computer architectures 
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and compiler designs. Let me give one example. :Suppose the idea 
is, to make a performance test of a new parallel computer. If we 
are ignoring Forth and focus entirely on C++ and Intel CPU we 
would take a standard-C++ program which has 100 MB sourcecode, 
a standard Operating System like Linux and a standard X86 Processor 
which has 5 billion transistors. We are wiring all these compentents 
together and investige who to run it in parallel. Will this setup works? 
Never! There are to many components in the loop, it is a mess. 
The better approach is, to write in Forth a small CPU. program a 
operating system from scratch and run some applications on it. The 
overall project has around 2000 lines of Code in Forth and it easy 
to inspecting every single piece of it. If something goes wrong it is 
possible to go down to the transistor to search for the failure. 

Let us investigate some antipattern. If someone is trying to use 
Forth for build a production application with 1 million lines of code, 
he has made a mistake. The project will fail horrible. The reason 
is, that Forth do not support OOP. Another problem is, that the 
performance of Forth is poor. Another Antipattern is, to use C++ 
in a computer course for explaining what a compiler is. C++ is to 
complicated and the knowledge right now is in 6 months obsolete. 

Forth advantage Let us watch a homebrew CPU project which 
isn’t aware of Forth. 17 is a CPU entirely made of transistors. The 
system was build for educational reasons, and perhaps the inventor 
had a lot of fun with it. According to the schematic diagram it is a 
register based machine, which is equal to modern computers, which 
also have registers. The result is, that the above cited homebrew 
project is more complicated than it should be. The instruction set is 
high, and the wiring to complicated. Forth can solve the issue. Let us 
investigate the idea in detail. 

A minimal computer uses fixed logicgates and it is not pro¬ 
grammable. The term in the literature for such device is Finite state 
machine. That means, the transistors are ordered to a logic circuit and 
a input signal results into an output signal. Such device can not be 
programmed, so it is not a real computer. The next step after a wired 
Finite state machine is not a register based computer, but between 
them comes the Stackmachine. That is a minimal computer which is 
described by the Forth community in detail. It is like a Finite state 
machine, turing capable, but can execute a program. The program 
can be changed so it is possible to program the device. 

The open question until now is: how to build out of transistors 
a stackmachine which runs Forth? The answer is not given. Most 
homebrew projects are devoted to register based cpu architecture. 
But Forth can provide a realy minimalistic device. In that sense, 
that it is not able to build a computer more minimalistic as Forth. 
The Megaprocessor consists of 15300 transistors for the CPU and 
additonally 27000 transistor for the RAM. The overall transistorcount 
is higher than it should be. The J1 Forth CPU for example, but i do 
not need to explain the details here. 

Let us first describe, what in the Megaprocessor project was done 
right. The general idea to build a cpu from tranistors in an educational 
computer is quite good. So the students can understand what a 
computer is on a very basic level. What was done wrong in the 
project, is the architecture of the CPU itself. A switch from a register 
machine to a stackmachine would increase the educational value for 
the public. 

I think the overall question can be answered with the transistor 
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count. The intel 4004 for example had around 2300 transistors. I 
would suggest, that it is possible to reduce the amount. 

Forth disadvantages After some applause for Forth it is time to 
describe the disadvantages. Forth was inventend around the year 
1970, until now it wasn’t used on real softwareprojects like Operating 
systems, games or network programming. Instead a different architec¬ 
ture was successful in the mainstream, which is called register-based 
CPU plus C programming language. The reason why has to do with 
the need for bigger software projects. The major problem in comput¬ 
ing is not to improve the hardware, or make it more faster. The major 
problem is the software crisis. That means, around the 1970s modem 
computers from this time had enough RAM and enough harddrive 
capacity. The problem was, that no software were available. And 
until today, this problem is not solved. That means, a cheap PC brings 
around 4 GB RAM with it, and 200 GB harddrive, but how to fill the 
space? 

The computing history has answered the problem with three ideas: 
compiler languages like C, objectoriented programming like C++ 
and Service-oriented architecture. The idea is to abstract from a 
concrete machine and separate between hardware and software. Let 
us investigate how compiler languages like C are implemented on 
a CPU. Programming such compilers is done on register machines. 
They have special registers and context-switching possibilities. The 
concept is called “compiler friendly” and means, that the CPU has to 
adapt to the compiler with the purpose to support high-level-language 
programming. A Forth system can’t adapt to anything. It is not 
possible to run a C program on a stackmachine. That is the reason, 
why Forth is not used in real software projects, especially not in big 
softwareprojects which are working with object oriented program¬ 
ming. 

Is it possible to create large applications will million lines of code 
in Forth? No. Is it possible to replace large projects with smaller 
programs? No. The reason why the Linux operating system consists 
of million lines of code and not only of 10000 is, because Linux is so 
powerful. What we have seen in computer history is a double strategy. 
On the one hand, we had register-machines, C, C++ for real life big 
projects, and on the other hand the Forth language was developed 
into a teaching language for simplified lesson in cpu architecture. 

Let us describe the above cited service oriented architecture in 
detail. On a technical level it is equal to a REST services on the 
internet and to a abstract class on a local machine. The basic idea 
is, to create a layer which hides complexity and provide certain 
interfaces to the outside world. For example, somebody has written a 
10000 lines long program which is doing something. Service oriented 
architecture means, that he programs on top of the program in addition 
400 more lines of code, which have the only reason to provide a clean 
number of functionnames. The additional 400 lines of code have no 
direct task, they are from the program itself useless. But there are 
important if the program is used as a library which works together 
with other libraries. Such feature is enabled in the C++ framework as 
default, and it is the reason why the language is used so widespread. 
And yes, such a feature results into bloatware. That is software which 
is large, and is doing nothing. 

6.3 C Compiler for Forth? 

[9] describes a C compiler which targets a stackmachine. The idea is, 
that writing programs in Forth is difficult and it would make sense to 
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use existing C sourcecode and run a Forth machine. Doing so is not 
easy, and the paper has failed to realize such a compiler. But let us 
go into the details. 

The compiler which was used is LCC. The reason is, that LCC 
makes it easy for writing a special backend for a stackmachine. The 
idea is, to make “stack scheduling”, that means to use the stack like 
register and put C variables on the stack. The problem is, that in 
reality a stack and a register are different things. For example, if a 
variable is put deep in the stack on position 12, it is not possible to 
access the variable anymore because according to the definition only 
the top element of a stack can be retrieved. 

The problem is on the side of the c programmer. He writes perhaps 
a subroutine which uses 5 local variables, for storing temporary 
information, the result or whatever. The compiler is trying to convert 
the sourcecode into machine code. On a register machine it is not a 
problem, because the local variables are stored in the registers which 
makes it fast to access them. On a stackmachine such registers are 
not available. What the compiler can do, is to to store the variables in 
the memory. But, if the CPU wants to access it, it takes to much time. 

To make the point a bit clear: It is not possible to write a c-compiler 
for a stackmachine. It will be always slower. That is the reason 
why mainstream computing is devoted to register machines, because 
such machine fits better to c programming and C++ programming. 
Apropos C++. How difficult it is to map objects to a stackmachine? 
Really difficult. 

What is the alternative? The alternative is not use existing C pro- 
gramcode but writing new sourcecode in Forth. Forth subroutines 
are using usually no local variables and they run very fast on a stack- 
machine. There is only a minor problem: the number of sourcecode 
written in Forth is small. 

We can see the lack of C compiler for stackmachines as an advan¬ 
tage, because it results into an environment which is different from 
mainstream computing. In productive softwaresystem everybody 
is using C like languages which runs best on x86 CPUs. And for 
educational purposes the other way around is the gold standard. That 
means to use a stackmachine together with Forth. The absence of C 
compiler marks the border. There is a difference between Forth and 
the rest in computing. [9] has proven, that it is not possible to sit in 
between.. The user must decide between Forth and C. 

6.4 Beginners are chosen Assembly language, ex¬ 
perts C++ 

It is a surprising fact, that people who have no programming expe¬ 
rience at at all are preferring Assembly language for their first trial. 
The expectation might be the difference, because programming in 
Assembly is called demanding and writing an operating system from 
scratch is outside the scope of amateurs. But in reality, many of them 
are doing exactly this. The reason is, that from a certain point of view. 
Assembly is a beginner friendly language. The ordinary program is 
very short and is optimized for a certain type of architecture. If some¬ 
one has invested one week into reading some books about Assembly, 
he can start writing his first graphics demo and even program his first 
game. 

There is only one problem: Assembly is not used by professionals. 
The reason is, that they are not interested in writing short programs 
for a certain CPU type. Instead the profi programmers is writing 
longer programs with million of lines of code, and he is doing so 
under existing operating system. This results into a different program¬ 


ming culture. That means, apart from the stereotype C/C++ is the 
hardest programming language ever invented. Not because it is so 
difficult to store a number in a variable, but because of the problems, 
C++ programmers are trying to solve. They are not interested in 
getting 10% more performance out of a certain loop, instead C++ 
programmers try to fix a database application or the write a game 
against a given game engine. 

In contrast Assembly programming has very much to do with the 
language itself. Which instruction set is used, how to iterate through 
a loop or how to make certain tricks on AMD64 PC. Both types 
of programmers are interacting with the computer, but Assembly 
programmers are doing so in an beginner friendly version. That 
means, they are not solving any real issues, instead they are putting 
graphics on the screen or they playing around with the interrupt. 
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